Strings in Python

What is a string?

A "string" is a sequence of characters of arbitrary length. Strings are immutable - they cannot be changed once created. Any operation that appears to modify a string actually returns a new string and leaves the original untouched.


In [2]:
s1 = 'Godzilla'
print s1, s1.upper(), s1


Godzilla GODZILLA Godzilla
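
A minimal sketch of what immutability means in practice. The variable names (original, shouted, renamed) are made up for illustration. Each print() call takes a single argument so the output looks the same under Python 2 and Python 3.

```python
original = 'Godzilla'
shouted = original.upper()   # upper() returns a new string

# The original is untouched by upper().
print(original)   # Godzilla
print(shouted)    # GODZILLA

# Trying to change a character in place fails with a TypeError.
try:
    original[0] = 'M'
    changed_in_place = True
except TypeError:
    changed_in_place = False

print(changed_in_place)   # False

# To "modify" a string, build a new one instead.
renamed = 'M' + original[1:]
print(renamed)   # Modzilla
```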

String literals

A "literal" is essentially a string constant, already spelled out for you. Python normally displays string values in single quotes on output, even if you typed double quotes, but that's just a display convention.


In [3]:
"Godzilla"


Out[3]:
'Godzilla'

Single and double quotes

Generally, a string literal can be in single ('), double ("), or triple (''') quotes. Single and double quotes are equivalent - use whichever you prefer (but be consistent). If you need to have a single or double quote in your literal, surround your literal with the other type, or use the backslash to escape the quote.


In [4]:
"Godzilla's a kaiju."


Out[4]:
"Godzilla's a kaiju."

In [5]:
'Godzilla\'s a kaiju.'


Out[5]:
"Godzilla's a kaiju."

In [6]:
'We call him... "Godzilla".'


Out[6]:
'We call him... "Godzilla".'

Triple quotes (''')

Triple quotes (''' or """) are a special form of quoting that lets a literal span several lines. They are most often used for documenting your Python files (docstrings). We won't discuss that type here.

Raw strings

Raw strings (literals prefixed with r) don't use any escape character interpretation: a backslash stays a literal backslash. Use them when you have a complicated string that you don't want to clutter with lots of doubled backslashes. Python stores the backslashes for you.


In [7]:
print('This is a\ncomplicated string with newline escapes in it.')


This is a
complicated string with newline escapes in it.

In [8]:
print(r'This is a\ncomplicated string with newline escapes in it.')


This is a\ncomplicated string with newline escapes in it.
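
As a small illustration of the same idea, the sketch below compares a hand-escaped literal with its raw equivalent. The Windows-style path is a made-up example value.

```python
escaped = 'C:\\temp\\new_kaiju.txt'   # every backslash doubled by hand
raw = r'C:\temp\new_kaiju.txt'        # raw string: backslashes kept as typed

print(escaped == raw)   # True
print(len(r'\n'))       # 2: a backslash followed by the letter n
print(len('\n'))        # 1: a single newline character
print(repr(raw))        # 'C:\\temp\\new_kaiju.txt'
```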

Strings and numbers


In [12]:
x=int('122', 3)
x+1


Out[12]:
18

String objects

String objects are just the string variables you create in Python.


In [13]:
kaiju = 'Godzilla'
print(kaiju)


Godzilla

In [14]:
kaiju


Out[14]:
'Godzilla'

Note that the print() call shows no quotes, while simply entering the variable name does. That is a Python output convention. Entering just the name displays the result of the repr() function, which shows the value as you would type it in Python source code, not as a user would want to read it.


In [15]:
repr(kaiju)


Out[15]:
"'Godzilla'"

In [16]:
print(repr(kaiju))


'Godzilla'
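
One way to see the difference between str() and repr() is to read the repr() text back in. This sketch assumes the standard library's ast.literal_eval for that step; the example value is made up.

```python
import ast

kaiju = "Godzilla's roar"

as_text = str(kaiju)     # what print() shows
as_code = repr(kaiju)    # a literal Python could read back in

print(as_text)   # Godzilla's roar
print(as_code)   # "Godzilla's roar"

# Reading the repr back in recovers the original value.
round_trip = ast.literal_eval(as_code)
print(round_trip == kaiju)   # True
```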

String operators

When you read text from a file, it's just that - text. No matter what the data represents, it's still text. To use it as a number, you have to explicitly convert it to a number.


In [17]:
one = 1
two = '2'
print one, two, one + two


1 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-93b38a7083e9> in <module>()
      1 one = 1
      2 two = '2'
----> 3 print one, two, one + two

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [18]:
one = 1
two = int('2')
print one, two, one + two


1 2 3

In [19]:
num1 = 1.1
num2 = float('2.2')
print num1, num2, num1 + num2


1.1 2.2 3.3

You can also do this with hexadecimal and octal numbers, or any other base, for that matter.


In [20]:
print int('FF', 16)
print int('0xff', 16)
print int('777', 8)
print int('0777', 8)
print int('222', 7)
print int('110111001', 2)


255
255
511
511
114
441

If the conversion cannot be done, a ValueError exception is raised.


In [21]:
print int('0xGG', 16)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-14d162c91079> in <module>()
----> 1 print int('0xGG', 16)

ValueError: invalid literal for int() with base 16: '0xGG'
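
When reading messy data, it is often handy to catch that exception rather than let it stop the program. The helper name to_int_or_none below is hypothetical, just a sketch of the pattern.

```python
def to_int_or_none(text, base=10):
    """Return int(text, base), or None when text is not a valid number."""
    try:
        return int(text, base)
    except ValueError:
        return None

print(to_int_or_none('42'))         # 42
print(to_int_or_none('FF', 16))     # 255
print(to_int_or_none('0xGG', 16))   # None
```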

Concatenation


In [22]:
kaiju1 = 'Godzilla'
kaiju2 = 'Mothra'
kaiju1 + ' versus ' + kaiju2


Out[22]:
'Godzilla versus Mothra'

Repetition


In [23]:
'Run away! ' * 3


Out[23]:
'Run away! Run away! Run away! '

String keywords

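in

in is an operator, not a function: it tests whether one string occurs inside another.

NOTE: As a claim about the movies, this particular statement is false, but as a Python expression it evaluates to True! :^)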


In [24]:
'Godzilla' in 'Godzilla vs Gamera'


Out[24]:
True
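
The test is case-sensitive. A common workaround, sketched here with a made-up title string, is to lowercase the string being searched first.

```python
title = 'Godzilla vs Gamera'

print('godzilla' in title)           # False: the match is case-sensitive
print('godzilla' in title.lower())   # True once both sides are lowercase
print('Mothra' not in title)         # True
```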

String functions

len()


In [25]:
len(kaiju)


Out[25]:
8

String methods

Remember - methods are functions attached to objects, accessed via the 'dot' notation.

Basic formatting and manipulation

capitalize()/lower()/upper()/swapcase()/title()

In [26]:
kaiju.capitalize()


Out[26]:
'Godzilla'

In [27]:
kaiju.lower()


Out[27]:
'godzilla'

In [28]:
kaiju.upper()


Out[28]:
'GODZILLA'

In [29]:
kaiju.swapcase()


Out[29]:
'gODZILLA'

In [30]:
'godzilla, king of the monsters'.title()


Out[30]:
'Godzilla, King Of The Monsters'

center()/ljust()/rjust()

In [31]:
kaiju.center(20, '*')


Out[31]:
'******Godzilla******'

In [32]:
kaiju.ljust(20, '*')


Out[32]:
'Godzilla************'

In [33]:
kaiju.rjust(20, '*')


Out[33]:
'************Godzilla'

expandtabs()

In [34]:
tabbed_kaiju = '\tGodzilla'
print('[' + tabbed_kaiju + ']')


[	Godzilla]

In [35]:
print('[' + tabbed_kaiju.expandtabs(16) + ']')


[                Godzilla]

join()

In [36]:
' vs '.join(['Godzilla', 'Hedorah'])


Out[36]:
'Godzilla vs Hedorah'

In [37]:
','.join(['Godzilla', 'Mothra', 'King Ghidorah'])


Out[37]:
'Godzilla,Mothra,King Ghidorah'

strip()/lstrip()/rstrip()

In [38]:
'   Godzilla   '.strip()


Out[38]:
'Godzilla'

In [39]:
'xxxGodzillayyy'.strip('xy')


Out[39]:
'Godzilla'
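
Note that the argument to strip() is a set of characters, not a prefix or suffix. The sketch below shows how that can surprise you; the file name and the remove_suffix helper are hypothetical.

```python
filename = 'Gigan_report.txt'

# rstrip('.txt') strips any trailing '.', 't', or 'x', so it eats into 'report'.
print(filename.rstrip('.txt'))   # Gigan_repor

# A hypothetical helper that removes an exact suffix instead.
def remove_suffix(text, suffix):
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

print(remove_suffix(filename, '.txt'))   # Gigan_report
```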

In [40]:
'    Godzilla   '.lstrip()


Out[40]:
'Godzilla   '

In [41]:
'    Godzilla   '.rstrip()


Out[41]:
'    Godzilla'

partition()/rpartition()

In [42]:
battle = 'Godzilla x Gigan'
battle.partition(' x ')


Out[42]:
('Godzilla', ' x ', 'Gigan')

In [43]:
battle = 'Godzilla and Jet Jaguar vs. Gigan and Megalon'
battle.partition(' vs. ')


Out[43]:
('Godzilla and Jet Jaguar', ' vs. ', 'Gigan and Megalon')
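
Because partition() always returns exactly three pieces, it unpacks neatly into three names. A small sketch with a made-up "key = value" line:

```python
setting = 'kaiju = Godzilla'
key, sep, value = setting.partition('=')
print(key.strip())     # kaiju
print(value.strip())   # Godzilla

# When the separator is missing, the whole string lands in the first slot
# and the other two slots are empty strings.
head, missing_sep, tail = 'Godzilla'.partition('=')
print(head)                # Godzilla
print(missing_sep == '')   # True
print(tail == '')          # True
```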

In [44]:
battle = 'Godzilla vs Megalon vs Jet Jaguar'
battle.partition('vs')


Out[44]:
('Godzilla ', 'vs', ' Megalon vs Jet Jaguar')

In [45]:
battle = 'Godzilla vs Megalon vs Jet Jaguar'
battle.rpartition('vs')


Out[45]:
('Godzilla vs Megalon ', 'vs', ' Jet Jaguar')

replace()

In [46]:
battle = 'Godzilla vs Mothra'
battle.replace('Mothra', 'Anguiras')


Out[46]:
'Godzilla vs Anguiras'

In [47]:
battle = 'Godzilla vs a monster and another monster'
battle.replace('monster', 'kaiju', 2)


Out[47]:
'Godzilla vs a kaiju and another kaiju'

In [48]:
battle = 'Godzilla vs a monster and another monster and yet another monster'
battle.replace('monster', 'kaiju', 2)


Out[48]:
'Godzilla vs a kaiju and another kaiju and yet another monster'

split()/rsplit()

In [49]:
battle = 'Godzilla vs King Ghidorah vs Mothra'
battle.split(' vs ')


Out[49]:
['Godzilla', 'King Ghidorah', 'Mothra']

In [51]:
kaijus = 'Godzilla,Mothra,King Ghidorah'
kaijus.split(',')


Out[51]:
['Godzilla', 'Mothra', 'King Ghidorah']
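
Real data lines often carry stray whitespace around the separators. A minimal sketch, using a made-up input line, that combines split() with strip() and then reverses the operation with join():

```python
raw_line = ' Godzilla , Mothra,King Ghidorah ,Megalon \n'

# Split on commas, then trim the whitespace around each field.
names = [field.strip() for field in raw_line.split(',')]
print(names)   # ['Godzilla', 'Mothra', 'King Ghidorah', 'Megalon']

# join() is the inverse of split() for a fixed separator.
print(','.join(names))   # Godzilla,Mothra,King Ghidorah,Megalon
```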

In [52]:
kaijus = 'Godzilla Mothra King Ghidorah'
kaijus.split()


Out[52]:
['Godzilla', 'Mothra', 'King', 'Ghidorah']

In [53]:
kaijus = 'Godzilla,Mothra,King Ghidorah,Megalon'
kaijus.rsplit(',', 2)


Out[53]:
['Godzilla,Mothra', 'King Ghidorah', 'Megalon']

splitlines()

In [54]:
kaijus_in_lines = 'Godzilla\nMothra\nKing Ghidorah\nEbirah'
print(kaijus_in_lines)


Godzilla
Mothra
King Ghidorah
Ebirah

In [55]:
kaijus_in_lines.splitlines()


Out[55]:
['Godzilla', 'Mothra', 'King Ghidorah', 'Ebirah']

In [56]:
kaijus_in_lines.splitlines(True)


Out[56]:
['Godzilla\n', 'Mothra\n', 'King Ghidorah\n', 'Ebirah']

zfill()

In [57]:
age_of_Godzilla = 60
age_string = str(age_of_Godzilla)
print(age_string, age_string.zfill(5))


('60', '00060')

String information

isXXX()

In [58]:
print('Godzilla'.isalnum())
print('*Godzilla*'.isalnum())
print('Godzilla123'.isalnum())


True
False
True

In [59]:
print('Godzilla'.isalpha())
print('Godzilla123'.isalpha())


True
False

In [60]:
print('Godzilla'.isdigit())
print('60'.isdigit())


False
True

In [61]:
print('SpaceGodzilla'.isspace())
print('   '.isspace())


False
True

In [62]:
print('Godzilla'.islower())
print('godzilla'.islower())


False
True

In [63]:
print('Godzilla'.isupper())
print('GODZILLA'.isupper())


False
True

In [64]:
print('Godzilla vs Mothra'.istitle())
print('Godzilla X Mothra'.istitle())


False
True

count()

In [65]:
monsters = 'Godzilla and Space Godzilla and MechaGodzilla'
print 'There are ', monsters.count('Godzilla'), ' Godzillas.'
print 'There are ', monsters.count('Godzilla', len('Godzilla')), ' pseudo-Godzillas.'


There are  3  Godzillas.
There are  2  pseudo-Godzillas.

startswith()/endswith()

In [66]:
king_kaiju = 'Godzilla'
print king_kaiju.startswith('God')
print king_kaiju.endswith('lla')
print king_kaiju.startswith('G')
print king_kaiju.endswith('amera')


True
True
True
False

find()/index()/rfind()/rindex()

In [67]:
kaiju_string = 'Godzilla,Gamera,Gorgo,Space Godzilla'
print 'The first Godz is at position', kaiju_string.find('Godz')
print 'The second Godz is at position', kaiju_string.find('Godz', len('Godz'))


The first Godz is at position 0
The second Godz is at position 28

In [42]:
kaiju_string.index('Minilla')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-42-31bdc624ada2> in <module>()
----> 1 kaiju_string.index('Minilla')

ValueError: substring not found

In [44]:
kaiju_string.rindex('Godzilla')


Out[44]:
28
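
The key difference between the two families: find() and rfind() return -1 when the substring is missing, while index() and rindex() raise ValueError. The sketch below uses find() with a start position to collect every occurrence; the helper name find_all is hypothetical.

```python
def find_all(text, target):
    """Return the start position of every non-overlapping occurrence."""
    positions = []
    if not target:                # an empty target would never advance
        return positions
    start = text.find(target)
    while start != -1:            # find() returns -1 when nothing is left
        positions.append(start)
        start = text.find(target, start + len(target))
    return positions

kaiju_string = 'Godzilla,Gamera,Gorgo,Space Godzilla'
print(find_all(kaiju_string, 'Godz'))   # [0, 28]
print(find_all(kaiju_string, 'G'))      # [0, 9, 16, 28]
print(kaiju_string.find('Minilla'))     # -1
```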

Advanced features

decode()/encode()/translate()

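encode() and decode() convert strings to and from Unicode and other encodings (for example, UTF-8 bytes); translate() maps characters to other characters through a lookup table. Rarely used in science code.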
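
A minimal sketch of a round trip through UTF-8, assuming a Unicode literal (the u prefix) as the starting point. The sample text is made up for illustration.

```python
text = u'Gojira: \u30b4\u30b8\u30e9'   # the u prefix marks a Unicode string

encoded = text.encode('utf-8')      # text -> bytes
decoded = encoded.decode('utf-8')   # bytes -> text

print(len(text))         # 11 characters
print(len(encoded))      # 17 bytes: each katakana character takes 3 bytes in UTF-8
print(decoded == text)   # True
```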

String formatting

Similar to formatting in C, FORTRAN, etc. There is a lot more to this than I am showing here.


In [111]:
kaiju = 'Godzilla'
age = 60
print '%s is %d years old.' % (kaiju, age)


Godzilla is 60 years old.
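
To give a taste of what else is available, here is a sketch of field widths and precision in the % style, next to the equivalent str.format() call. The height_m value is purely illustrative.

```python
kaiju = 'Godzilla'
height_m = 50.0   # illustrative value only

# %-style: width and precision go between the % and the type letter.
line1 = '%-10s|%8.2f|' % (kaiju, height_m)
print(line1)   # Godzilla  |   50.00|

# str.format() expresses the same layout with {} fields.
line2 = '{0:<10}|{1:8.2f}|'.format(kaiju, height_m)
print(line2)   # Godzilla  |   50.00|

print(line1 == line2)   # True
```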

The string module

The string module is the Python equivalent of "junk DNA" in living organisms. It's been around since the beginning, but many of its functions have been superseded by evolution. But some ancient code still relies on it, so they leave the old parts in....

For modern code, the string module does have some useful constants and functions.


In [68]:
import string

In [69]:
print string.ascii_letters
print string.ascii_lowercase
print string.ascii_uppercase


abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ

In [70]:
print string.digits
print string.hexdigits
print string.octdigits


0123456789
0123456789abcdefABCDEF
01234567
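
These constants are handy for simple validation. A small sketch, with a hypothetical helper named is_hex, that checks every character against string.hexdigits:

```python
import string

def is_hex(text):
    """True when text is non-empty and every character is a hex digit."""
    return len(text) > 0 and all(ch in string.hexdigits for ch in text)

print(is_hex('FF'))     # True
print(is_hex('ff00'))   # True
print(is_hex('GG'))     # False
print(is_hex(''))       # False
```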

In [71]:
print string.letters
print string.lowercase
print string.uppercase


abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ

In [72]:
print string.printable
print string.punctuation
print string.whitespace


0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
	
 

The string module also provides the Formatter class, which can be useful for sophisticated text formatting.

Regular Expressions

What is a regular expression?

Regular expressions ('regexps') are essentially a mini-language for describing string operations. Everything shown above with string methods and operators can be done with regular expressions. Most of the time, the regular expression version is more concise. But not always more readable....

To use regular expressions, you have to import the 're' module.


In [73]:
import re

A very short, whirlwind tour of regular expressions

Scanning


In [3]:
kaiju_truth = 'Godzilla is the King of the Monsters. Ebirah is also a monster, but looks like a giant lobster.'
re.findall('Godz', kaiju_truth)


Out[3]:
['Godz']

In [18]:
print re.findall('(^.+) is the King', kaiju_truth)


['Godzilla']

For simple searches like this, using the in operator is typically easier. Regexps are by default case-sensitive.


In [21]:
print re.findall(r'\. (.+) is also', kaiju_truth)


['Ebirah']

In [39]:
print re.findall('(.+) is also a (.+)', kaiju_truth)[0]
print re.findall(r'\. (.+) is also a (.+),', kaiju_truth)[0]


('Godzilla is the King of the Monsters. Ebirah', 'monster, but looks like a giant lobster.')
('Ebirah', 'monster')
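
The first result above is so long because .+ is greedy: it grabs as much text as it can. One alternative, sketched here as an assumption rather than the only fix, is to use the narrower \w+ (word characters only) in place of .+ so each group stops at the end of a word.

```python
import re

kaiju_truth = ('Godzilla is the King of the Monsters. '
               'Ebirah is also a monster, but looks like a giant lobster.')

# Greedy .+ grabs as much as it can, so the first group swallows a whole sentence.
greedy = re.findall(r'(.+) is also a (.+)', kaiju_truth)[0]
print(greedy[0])   # Godzilla is the King of the Monsters. Ebirah

# \w+ only matches word characters, so each group stops at the end of a word.
precise = re.findall(r'(\w+) is also a (\w+)', kaiju_truth)
print(precise)     # [('Ebirah', 'monster')]
```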

Changing


In [10]:
some_kaiju = 'Godzilla, Space Godzilla, Mechagodzilla'
print re.sub('Godzilla', 'Gamera', some_kaiju)
print re.sub('(?i)Godzilla', 'Gamera', some_kaiju)


Gamera, Space Gamera, Mechagodzilla
Gamera, Space Gamera, MechaGamera

And so much more...

You could spend a whole day (or more) just learning about regular expressions. But they are incredibly useful and powerful, especially in the all-too-frequent drudgery of munging files from one format to another.

Regular expressions can be internally compiled for speed.
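
A minimal sketch of that, using re.compile() to build the pattern once and reuse it on several lines. The pattern name and the sample lines are made up for illustration.

```python
import re

# Compile once, reuse many times; re.IGNORECASE plays the role of (?i).
godzilla_pattern = re.compile(r'godzilla', re.IGNORECASE)

lines = [
    'Godzilla, Space Godzilla, Mechagodzilla',
    'Gamera, Gorgo',
    'MECHAGODZILLA returns',
]

# Count the matches on each line with the same compiled pattern.
counts = [len(godzilla_pattern.findall(line)) for line in lines]
print(counts)   # [3, 0, 1]

print(godzilla_pattern.sub('Gamera', lines[0]))   # Gamera, Space Gamera, MechaGamera
```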